Applying Meaningful Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Abstract
Syllable-to-word (STW) conversion is a frequently used Chinese input method that is fundamental to syllable/speech understanding. The two major problems with STW conversion are the segmentation of syllable input and the ambiguities caused by homonyms. This paper describes a meaningful word-pair (MWP) identifier that can be used to resolve homonym/segmentation ambiguities and perform STW conversion effectively for Chinese language texts. It is designed as a support system for Chinese input systems. Five types of meaningful word-pairs are investigated: noun-verb (NV), noun-noun (NN), verb-verb (VV), adjective-noun (AN) and adverb-verb (DV). The pre-collected datasets of meaningful word-pairs are based on our previous work on the auto-generation of NVEF knowledge in Chinese [Tsai et al. 2003a and 2004], where NVEF stands for noun-verb event frame. The main purpose of this study is to illustrate that a hybrid approach combining statistical language modeling (SLM) with contextual information, such as meaningful word-pairs, is effective for syllable-to-word conversion and is important for syllable/speech understanding. Our experiments show the following: (1) the MWP identifier achieves tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.69% and 90.7%, respectively, among the identified word-pairs for the test syllables; (2) STW error analysis shows that the main problem of tonal STW systems is the failure of homonym disambiguation (52% of errors), while that of toneless STW systems is inadequate syllable segmentation (48% of errors); (3) applying the MWP identifier together with an optimized bigram model and the Microsoft input method editor (MSIME 2003) improves the tonal/toneless STW accuracies of the two STW systems from 96.27%/85.47% to 96.75%/87.74% and from 95.05%/86.94% to 96.30%/89.97%, respectively.
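The idea of using word-pairs to resolve both segmentation and homonym ambiguity can be illustrated with a minimal sketch. This is not the authors' implementation: the toy lexicon, the single word-pair entry, the function names, and the scoring rule (count matched pairs, then prefer fewer words) are all assumptions made for illustration, and the bigram model that the paper combines with the identifier is omitted.

```python
from itertools import product

# Hypothetical toy lexicon: toneless syllable tuples -> homonym candidates.
LEXICON = {
    ("shi",): ["是", "事"],
    ("yong",): ["用"],
    ("dian",): ["點", "電"],
    ("nao",): ["腦", "鬧"],
    ("shi", "yong"): ["使用", "試用", "適用"],
    ("dian", "nao"): ["電腦"],
}

# Hypothetical meaningful word-pair set; here one noun-verb (NV) pair.
MWP = {("電腦", "使用")}


def segmentations(syllables):
    """Yield every split of the syllable list into keys found in LEXICON."""
    if not syllables:
        yield []
        return
    for end in range(1, len(syllables) + 1):
        key = tuple(syllables[:end])
        if key in LEXICON:
            for rest in segmentations(syllables[end:]):
                yield [key] + rest


def count_pairs(words):
    """Count word pairs in the candidate that appear in MWP (either order)."""
    return sum(
        (a, b) in MWP or (b, a) in MWP
        for i, a in enumerate(words)
        for b in words[i + 1:]
    )


def convert(syllables):
    """Pick the candidate with the most matched pairs, then the fewest words."""
    best = None
    for seg in segmentations(syllables):
        for words in product(*(LEXICON[key] for key in seg)):
            score = (count_pairs(words), -len(words))
            if best is None or score > best[0]:
                best = (score, list(words))
    return best[1] if best else None


print(convert(["shi", "yong", "dian", "nao"]))
```

In this toy setting, the known pair selects both the two-word segmentation and the homonym 使用 over 試用 and 適用. A real system would fall back to a statistical language model for the portions of the input where no word-pair is identified.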
Similar resources
Applying a Mix Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
This paper describes a mix word-pair (mix-WP) identifier to resolve homonym/segmentation ambiguities as well as perform STW conversion effectively for Chinese input. The mix-WP identifier includes a specific word-pair (SWP) identifier and a common word-pair (CWP) identifier. It is designed as a supporting process for Chinese input systems. Our experiments show that by applying the mix-WP iden...
Applying an NVEF Word-Pair Identifier to the Chinese Syllable-to-Word Conversion Problem
Syllable-to-word (STW) conversion is important in Chinese phonetic input methods and speech recognition. There are two major problems in STW conversion: (1) resolving the ambiguity caused by homonyms; (2) determining the word segmentation. This paper describes a noun-verb event-frame (NVEF) word identifier that can be used to solve these problems effectively. Our approach includes (a) an NV...
Using Word-Pair Identifier to Improve Chinese Input System
This paper presents a word-pair (WP) identifier that can be used to resolve homonym/segmentation ambiguities and perform syllable-to-word (STW) conversion effectively for improving Chinese input systems. The experimental results show the following: (1) the WP identifier is able to achieve tonal (syllables with four tones) and toneless (syllables without four tones) STW accuracies of 98.5% and 90....
Applying Word Pair Model to the Chinese Syllable-to-Word Problem
Syllable-to-word (STW) conversion is a main task in Chinese language processing and is fundamental to syllable/speech understanding. The two major problems of STW conversion are syllable-word segmentation and homophone selection. This paper presents a word pair model (WPM) that can effectively perform homophone selection and syllable-word segmentation to improve Chinese input systems. The STW ex...
Auto-Discovery of NVEF Word-Pairs in Chinese
A meaningful noun-verb word-pair in a sentence is called a noun-verb event-frame (NVEF). Previously, we developed an NVEF word-pair identifier to demonstrate that NVEF knowledge can be used effectively to resolve the Chinese word-sense disambiguation (WSD) problem (with 93.7% accuracy) and the Chinese syllable-to-word (STW) conversion problem (with 99.66% accuracy) on the NVEF related port...